CNN-MERP: An FPGA-Based Memory-Efficient Reconfigurable Processor for Forward and Backward Propagation of Convolutional Neural Networks

Abstract

Large-scale deep convolutional neural networks (CNNs) are widely used in machine learning applications. Because CNNs are computationally very demanding, VLSI (ASIC and FPGA) chips, which deliver high-density integration of computational resources, are regarded as a promising platform for CNN implementation. With massive parallelism of computational units, however, the external memory bandwidth, which is constrained by the pin count of the VLSI chip, becomes the system bottleneck. Moreover, VLSI solutions are usually regarded as lacking the flexibility to be reconfigured for the various parameters of CNNs. This paper presents CNN-MERP to address these issues. CNN-MERP incorporates an efficient memory hierarchy that significantly reduces the bandwidth requirements through multiple optimizations, including on/off-chip data allocation, data flow optimization, and data reuse. The proposed two-level reconfigurability, based on the control logic and the multiboot feature of the FPGA, enables fast and efficient reconfiguration. As a result, an external memory bandwidth requirement of 1.94 MB/GFlop is achieved, which is 55% lower than prior art. Under limited DRAM bandwidth, a system throughput of 1244 GFlop/s is achieved on the Virtex UltraScale platform, which is 5.48 times higher than state-of-the-art FPGA implementations.
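The headline figures in the abstract can be related with some back-of-the-envelope arithmetic. The sketch below is not taken from the paper: it assumes that the 1.94 MB/GFlop and 1244 GFlop/s figures describe the same operating point, and that "55% lower" and "5.48 times higher" are measured against a single baseline each. The function names are hypothetical and chosen only for illustration.

```python
# Figures quoted in the abstract.
MB_PER_GFLOP = 1.94        # external memory traffic per GFlop of work
THROUGHPUT_GFLOPS = 1244.0 # reported system throughput in GFlop/s
REDUCTION = 0.55           # "55% lower than prior art"
SPEEDUP = 5.48             # "5.48 times higher" than prior FPGA designs


def implied_dram_bandwidth_mb_s(mb_per_gflop: float, gflops: float) -> float:
    """External bandwidth needed to sustain `gflops` (assumes both
    abstract figures refer to the same operating point)."""
    return mb_per_gflop * gflops


def implied_baseline_mb_per_gflop(mb_per_gflop: float, reduction: float) -> float:
    """Baseline traffic per GFlop if the new figure is `reduction` lower."""
    return mb_per_gflop / (1.0 - reduction)


def implied_baseline_gflops(gflops: float, speedup: float) -> float:
    """Baseline throughput if the new figure is `speedup` times higher."""
    return gflops / speedup


if __name__ == "__main__":
    bw = implied_dram_bandwidth_mb_s(MB_PER_GFLOP, THROUGHPUT_GFLOPS)
    base_traffic = implied_baseline_mb_per_gflop(MB_PER_GFLOP, REDUCTION)
    base_gflops = implied_baseline_gflops(THROUGHPUT_GFLOPS, SPEEDUP)
    print(f"implied DRAM bandwidth: {bw:.2f} MB/s")             # about 2413 MB/s
    print(f"implied baseline traffic: {base_traffic:.2f} MB/GFlop")  # about 4.31
    print(f"implied baseline throughput: {base_gflops:.1f} GFlop/s") # about 227
```

Under these assumptions, sustaining the reported throughput would need roughly 2.4 GB/s of external bandwidth, while the implied baselines are about 4.31 MB/GFlop and about 227 GFlop/s. These derived numbers are illustrative only and do not appear in the abstract.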
